Tracking Hot-k Items over Web 2.0 Streams
Abstract
The rise of Web 2.0 has made content publishing easier than ever. Yesterday’s passive consumers are now active users who generate and contribute new data to the web at an immense rate. We consider the evaluation of data-driven aggregation queries that arise in Web 2.0 applications. In this context, each user action is interpreted as an event in a corresponding stream, e.g., a particular weblog feed or a photo stream. The presented approach continuously tracks the most popular tags attached to the incoming items and, based on these tags, constructs a dynamic top-k query. By continuously evaluating this query on the incoming stream, we are able to retrieve the currently hottest items. To limit the query processing cost, we propose to pre-aggregate index lists for parts of the query, which are later used to construct the full query result. As it is prohibitively expensive to materialize lists for all possible tag combinations, we select those tag sets that promise the largest expected performance gain, based on predictions leveraging traditional FM sketches. To demonstrate the suitability of our approach, we present a performance evaluation using a real-world dataset obtained from a weblog crawl.
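The core loop described in the abstract (track the popular tags, build a query from them, rank the incoming items against it) can be illustrated with a small sketch. This is not the authors' implementation: the count-based sliding window, the scoring rule (sum of the window frequencies of an item's hot tags), and all names such as `HotItemTracker`, `window_size`, and `num_hot_tags` are assumptions made for illustration, and the pre-aggregated index lists and FM-sketch predictions are left out entirely.

```python
from collections import Counter, deque
import heapq


class HotItemTracker:
    """Illustrative sliding-window tracker (assumed design, not the paper's)."""

    def __init__(self, window_size, num_hot_tags, k):
        self.window_size = window_size    # number of most recent items kept
        self.num_hot_tags = num_hot_tags  # how many popular tags form the query
        self.k = k                        # how many hot items to report
        self.window = deque()             # (item_id, tags) pairs, oldest first
        self.tag_counts = Counter()       # tag frequencies inside the window

    def add(self, item_id, tags):
        """Register a new stream item and expire the oldest one if needed."""
        tags = frozenset(tags)
        self.window.append((item_id, tags))
        self.tag_counts.update(tags)
        if len(self.window) > self.window_size:
            _, old_tags = self.window.popleft()
            self.tag_counts.subtract(old_tags)
            for tag in old_tags:
                if self.tag_counts[tag] <= 0:
                    del self.tag_counts[tag]  # keep the counter free of stale tags

    def hot_tags(self):
        """The currently most popular tags; these form the dynamic query."""
        return [tag for tag, _ in self.tag_counts.most_common(self.num_hot_tags)]

    def hot_items(self):
        """Top-k items in the window, scored by the counts of their hot tags."""
        query = set(self.hot_tags())
        scored = (
            (sum(self.tag_counts[t] for t in tags & query), item_id)
            for item_id, tags in self.window
        )
        best = heapq.nlargest(self.k, (pair for pair in scored if pair[0] > 0))
        return [(item_id, score) for score, item_id in best]
```

A short usage example under the same assumptions: after adding four items tagged `python`/`web`, `python`, `python`/`web`/`ajax`, and `rss` with `num_hot_tags=2`, the query consists of `python` and `web`, and the two items carrying both tags rank highest. Re-scoring every item in the window on each call is the cost that the abstract proposes to reduce with pre-aggregated index lists.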
Similar resources
Monitoring frequent items over distributed data streams
Robert H. Fuller, April 3, 2007. Many important applications require the discovery of items which have occurred frequently. Knowledge of these items is commonly used in anomaly detection and network monitoring tasks. Effective solutions for this problem focus mainly on reducing memory requirements in a centralized environment. These solution...
Counting Distinct Items over Update Streams
We present two novel algorithms for tracking the number of distinct items over high-speed data streams consisting of insertion and deletion operations. Both improve on the space and time complexity of existing algorithms.
Distributed Monitoring of Frequent Items
Monitoring frequently occurring items is a recurring task in a variety of applications. Although a number of solutions have been proposed, few address the problem in a distributed networked environment. Most past solutions relied upon approximating results to lower communication overhead. In this paper we introduce a new algorithm designed for continuously tracking frequent item...
Viral, Quality, and Junk Videos on YouTube: Separating Content from Noise in an Information-Rich Environment
With the rise of Web 2.0 there is an ever-expanding source of interesting media because of the proliferation of user-generated content. However, mixed in with this is a large amount of noise that creates a proverbial “needle in the haystack” problem when searching for relevant content. Although there is hope that the rich network of interwoven metadata may contain enough structure to eventually help sif...
Experiences from Implementing Collaborative Filtering in a Web 2.0 Application
The goal of this paper is to report our experiences from integrating item-based collaborative filtering into the Web 2.0 site linkfun.net. We discuss the necessary steps to implement the selected Slope One algorithm in our real-world application. Performance optimization was necessary to allow for recommendations without any delays in page generation on our site. Firstly, we signi...
Publication date: 2011